COVID-19 Outbreak

This short report analyzes some of the data related to the COVID-19 pandemic. This page is a longer and more detailed version of the summary.

For the visualization, we have developed a simple Python module (work in progress) with a set of functions that can be used to load and plot the data without digging too deeply into them.

The module (hedera_covid.py) and the data are both available in the repository. We periodically update the datasets from the online repository maintained by Johns Hopkins CSSE.

# for plotly
from plotly.offline import iplot
from plotly.offline import init_notebook_mode, plot
from IPython.display import display, HTML
import plotly as py
import plotly.tools as tls

import numpy as np
from hedera_covid import DataHandler, plot_death_rate, plot_daily_cases, plot_confirmed_cases

# load data
path_confirmed = '../../Data/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
path_death = '../../Data/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'

covid_data = DataHandler(data_confirmed_path = path_confirmed,
                         data_death_path = path_death)

covid_data.confirmed.head()
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 ... 4/2/20 4/3/20 4/4/20 4/5/20 4/6/20 4/7/20 4/8/20 4/9/20 4/10/20 4/11/20
0 NaN Afghanistan 33.0000 65.0000 0 0 0 0 0 0 ... 273 281 299 349 367 423 444 484 521 555
1 NaN Albania 41.1533 20.1683 0 0 0 0 0 0 ... 277 304 333 361 377 383 400 409 416 433
2 NaN Algeria 28.0339 1.6596 0 0 0 0 0 0 ... 986 1171 1251 1320 1423 1468 1572 1666 1761 1825
3 NaN Andorra 42.5063 1.5218 0 0 0 0 0 0 ... 428 439 466 501 525 545 564 583 601 601
4 NaN Angola -11.2027 17.8739 0 0 0 0 0 0 ... 8 8 10 14 16 17 19 19 19 19

5 rows × 85 columns

In the code above, we first load the data and create an object of type DataHandler. This class performs the required operations on the dataset internally, so that the data are clean and ready for plotting.
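The internals of DataHandler are not shown on this page. As a rough illustration of the kind of cleaning such a class might do, the sketch below reads a CSV in the same wide layout as the table above and sums the rows of each country over its provinces. The function name load_country_totals and the sample numbers are hypothetical and are not taken from hedera_covid.py.

```python
import csv
import io
from collections import defaultdict

# Toy excerpt in the same wide layout as the time series file
# (made-up numbers for illustration only, not real data).
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,CountryA,43.0,12.0,0,0,2
Province1,CountryB,30.9,112.2,10,20,30
Province2,CountryB,40.1,116.4,1,2,3
"""

def load_country_totals(csv_file):
    """Return (dates, {country: cumulative counts}) summed over provinces."""
    reader = csv.DictReader(csv_file)
    meta = {"Province/State", "Country/Region", "Lat", "Long"}
    # Every non-metadata column is one day of cumulative counts.
    dates = [col for col in reader.fieldnames if col not in meta]
    totals = defaultdict(lambda: [0] * len(dates))
    for row in reader:
        series = totals[row["Country/Region"]]
        for i, day in enumerate(dates):
            series[i] += int(row[day])
    return dates, dict(totals)

dates, totals = load_country_totals(io.StringIO(SAMPLE_CSV))
print(dates)               # ['1/22/20', '1/23/20', '1/24/20']
print(totals["CountryB"])  # [11, 22, 33]
```

To try this on a real file, the same function could be given an open file handle instead of the in-memory string.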

Initialize list of countries

Now we can retrieve the data of any country that we want to look at. This can be done using

my_country = covid_data.get_country('Name of the country')

The name of the country is the one given in the Country/Region column of the covid_data.confirmed dataframe.

For example:

italy = covid_data.get_country('Italy')
italy.keys()
dict_keys(['name', 'dates', 'confirmed', 'deaths', 'daily_new_cases', 'daily_deaths', 'start', 'start_death'])
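How the module fills the daily_new_cases entry is not shown here. One plausible way to derive daily values from a cumulative series is to take differences between consecutive days, as in this minimal sketch. The helper daily_increments is hypothetical, and the input is the last ten confirmed values for Afghanistan from the table above.

```python
def daily_increments(cumulative):
    """Differences between consecutive cumulative counts.

    The result has one element fewer than the input.
    """
    return [today - yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])]

# Cumulative confirmed cases for Afghanistan, 4/2/20 to 4/11/20 (from the table above).
confirmed = [273, 281, 299, 349, 367, 423, 444, 484, 521, 555]
daily = daily_increments(confirmed)
print(daily)  # [8, 18, 50, 18, 56, 21, 40, 37, 34]
```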

We can also create a list of countries

covid_data.add_country('Name of the country')

that we can analyze together.

my_countries = ['Italy','Spain','Germany','Austria','France','United Kingdom','US','Sweden','Netherlands']

for c in my_countries:
    covid_data.add_country(c)

Starting from this object, we can now use some functions implemented in our module to visualize the data.

Note: this is a work in progress!

Number of confirmed cases

The following figure shows the number of confirmed cases. Note that this is extremely dependent on the different testing protocols and capacity of different countries.

Total

First, we have to collect the data. For the reported cases we use this function:

data = covid_data.get_confirmed_data(start_date=0,n_smooth=7,rescale=True)

Parameters:

  • start_date: index of the first day to plot (0 = January 22)
  • n_smooth: number of days over which the data are averaged (7 is usually good)
  • rescale: set True to shift all curves to a common start (the day on which the number of reported cases reached 100 in the corresponding country). In this case start_date won't be used.
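To make the effect of n_smooth and rescale more concrete, here is a minimal sketch of both operations on a toy series. It assumes a trailing moving average and a threshold of 100 cases. The function names and the exact averaging scheme are assumptions and may differ from what the module does internally.

```python
def moving_average(values, n_smooth):
    """Trailing mean over up to n_smooth days; n_smooth <= 1 means no smoothing."""
    if n_smooth <= 1:
        return list(values)
    smoothed = []
    for i in range(len(values)):
        # The window is shorter at the beginning of the series.
        window = values[max(0, i - n_smooth + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

def rescale_to_threshold(values, threshold=100):
    """Drop the days before the series first reaches the threshold."""
    for i, value in enumerate(values):
        if value >= threshold:
            return list(values[i:])
    return []  # the threshold was never reached

# Toy cumulative series (made-up numbers for illustration).
series = [0, 20, 60, 120, 200, 320]
aligned = rescale_to_threshold(series)
smoothed = moving_average(aligned, 2)
print(aligned)   # [120, 200, 320]
print(smoothed)  # [120.0, 160.0, 260.0]
```

After rescaling, day 0 of every curve is the first day with at least 100 cases, which is why start_date has no effect in that mode.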

Then we can create a plotly bar chart (for example) and display it.

init_notebook_mode(connected=True)

fig = {
    "data": data,
    "layout": {"title": {"text": "Confirmed Cases (rescaled in each country)"}}
}

# write the figure to an HTML file, then render that file inline
plot(fig, filename = 'figure.html')
display(HTML(filename='figure.html'))
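The structure returned by get_confirmed_data is not documented on this page. Plotly figure dictionaries accept a list of trace dictionaries under "data", so a list of that shape could be assembled by hand as sketched below. The helper make_traces and the sample values are hypothetical.

```python
def make_traces(series_by_country, trace_type="scatter"):
    """Build one plotly-style trace dict per country.

    x is the day index, y the values of that country.
    """
    traces = []
    for name, values in series_by_country.items():
        traces.append({
            "type": trace_type,  # e.g. "scatter" or "bar"
            "name": name,
            "x": list(range(len(values))),
            "y": list(values),
        })
    return traces

# Toy values for illustration only.
example = {"CountryA": [100, 150, 230], "CountryB": [110, 120, 125]}

fig_sketch = {
    "data": make_traces(example, trace_type="bar"),
    "layout": {"title": {"text": "Confirmed Cases (toy example)"}},
}
print(len(fig_sketch["data"]))  # 2
```

A dictionary built this way can be passed to plot in the same way as the figure above.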

Daily variation

Under the assumption that the number of reported cases is representative of the total number in each country, looking at the daily new cases can give an idea of whether the countries are flattening the curve.

For this, we can use a function that gathers these data for the selected countries.

Parameters:

  • start_date: day on which the plot starts (0 = January 22)
  • n_smooth: smoothing of the data (data will be averaged over n_smooth days, 7 is usually good)
  • rescale: set True if you want to rescale curves to the same start. In this case start_date won't be used.

We use rescale = True: this means that each curve starts on the day the number of reported cases in that country reached 100.

data = covid_data.get_daily_confirmed_data(start_date=0,n_smooth=7,rescale=True)
init_notebook_mode(connected=True)

fig = {
    "data": data,
    "layout": {"title": {"text": "Daily Cases (rescaled)"}}
}

# write the figure to an HTML file, then render that file inline
plot(fig, filename = 'figure.html')
display(HTML(filename='figure.html'))

Mortality Rate

We can look at the official mortality rate (the ratio of reported deaths to confirmed cases) over time for a selected set of countries.

data = covid_data.get_death_rate_data(start_date=30,n_smooth=0,rescale=False)
init_notebook_mode(connected=True)

fig = {
    "data": data,
    "layout": {"title": {"text": "Official mortality rate: # Death/# Confirmed"}}
}

# write the figure to an HTML file, then render that file inline
plot(fig, filename = 'figure.html')
display(HTML(filename='figure.html'))
#Parameters:
#* `countries`: a list of countries
#* `start_date`: day on which the plot starts (0 = January 22)
#* `n_smooth`: smoothing of the data (data will be averaged over `n_smooth` days, 7 is usually good)
#* `rescale`: set `True` if you want to rescale curves to the same *start*. In this case `start_date` won't be used.
#* `log_scale`: set to `True` to use a logarithmic scale for the *y*-axis
#
#plot_confirmed_cases(covid_data.countries,start_date=30,n_smooth=0,rescale=False,log_scale=True)
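The official mortality rate plotted above is the ratio # Death / # Confirmed for each day. A minimal sketch of that computation, guarding against days with zero confirmed cases, could look as follows. The helper death_rate and the sample numbers are hypothetical, and the module may handle the zero case differently.

```python
def death_rate(deaths, confirmed):
    """Day-by-day ratio of cumulative deaths to cumulative confirmed cases.

    Days without confirmed cases yield None, since the ratio is undefined.
    """
    return [d / c if c > 0 else None for d, c in zip(deaths, confirmed)]

# Toy cumulative series (made-up numbers for illustration).
deaths = [0, 0, 2, 5]
confirmed = [0, 10, 40, 100]
rates = death_rate(deaths, confirmed)
print(rates)  # [None, 0.0, 0.05, 0.05]
```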